Making Indian Language Legacy Documents Accessible Via Web
Abstract
Reliable optical character recognition is not available for the scripts of Indian languages. Thus, the only way to make legacy documents in Indian languages available on the web is to scan them. This work addresses the need for a better representation and an efficient storage technique for Indian language documents, and for their near-perfect regeneration in the browser. We work with the segments (corresponding to text, images or white space) extracted from the original document page. To compress the segments separately, we use a Shape-Adaptive Wavelet based coding scheme, Run Length encoding and Arithmetic Bit-plane coding. An XML representation scheme is used to represent the document page, and the data is stored on a server. A plug-in has been implemented that decodes the encoded data coming from the server and displays the document page in the web browser, thereby making the document pages web accessible.
Keywords: document image analysis, shape adaptive compression, entropy based quantization, eBooks
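As a rough illustration of two of the steps named in the abstract, the sketch below run-length encodes one row of a binarized text segment and wraps the result in an XML page description. It is not the authors' scheme: the element and attribute names (`page`, `segment`, `type`, `x`, `y`, `codec`) and the `value:length` payload format are assumptions made only for this example, and the wavelet and arithmetic bit-plane coders are not shown.

```python
import xml.etree.ElementTree as ET


def rle_encode(row):
    """Run-length encode a sequence of 0/1 pixels as (value, run_length) pairs."""
    runs = []
    for pixel in row:
        if runs and runs[-1][0] == pixel:
            runs[-1][1] += 1          # extend the current run
        else:
            runs.append([pixel, 1])   # start a new run
    return [(value, length) for value, length in runs]


def rle_decode(runs):
    """Invert rle_encode: expand (value, run_length) pairs back into pixels."""
    pixels = []
    for value, length in runs:
        pixels.extend([value] * length)
    return pixels


def page_to_xml(segments):
    """Serialize segments into a hypothetical XML page layout (names are assumed)."""
    page = ET.Element("page")
    for seg in segments:
        el = ET.SubElement(
            page, "segment",
            type=seg["type"], x=str(seg["x"]), y=str(seg["y"]), codec=seg["codec"],
        )
        el.text = seg["data"]
    return ET.tostring(page, encoding="unicode")


if __name__ == "__main__":
    row = [0, 0, 0, 1, 1, 0, 1, 1, 1, 1]   # one scanline of a text segment
    runs = rle_encode(row)
    payload = " ".join(f"{value}:{length}" for value, length in runs)
    print(page_to_xml([
        {"type": "text", "x": 10, "y": 20, "codec": "rle", "data": payload},
    ]))
```

A decoder on the browser side would parse the XML, pick a decoder per segment from the `codec` attribute, and paint each decoded segment at its recorded position.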
Similar resources
Making Legacy Data Accessible for Xml Applications
This paper presents the design and implementation of DB2XML, a tool for transforming data from relational databases into XML documents. Document type declarations (DTDs) are generated describing the characteristics of the data, making the documents self-contained and usable as a data exchange format. DB2XML is written in Java and accesses databases through JDBC drivers. It can be used as a standalon...
Devising Interactive Access Techniques for Indian Language Document Images
A large volume of legacy documents in Indian languages exists only in paper form. Web-based interactive access techniques for images of these documents can ensure wider dissemination and easy availability. In this paper, we propose an access mechanism based on word-based indexing and personalized annotation. The word-based indexing scheme exploits typical structural characteristics of Indi...
Reverse Engineering Interaction Plans for Legacy Interface Migration
Legacy interface migration is becoming an increasingly important IT activity; many organizations are interested in cost-effective and low-risk processes for making their legacy systems accessible to new, web-based platforms. Most migration techniques proposed to date require a lot of human expertise. In this paper we discuss Mathaino, an intelligent, multi-platform, semi-automated, and low risk ...
Retrieval of Legal Documents: Combining Structured and Unstructured Information
Legal information is often accessible via portal web sites. Legal documents typically combine structured and unstructured information, the former being tagged with markup languages such as XML (Extensible Markup Language). Current information retrieval research takes into account the structured information content of documents when computing the relevance ranking. Such an approach is very promi...
FRBR-ML: A FRBR-based framework for semantic interoperability
Metadata related to cultural items such as literature, music and movies is a valuable resource that is currently exploited in many applications and services based on semantic web technologies. A vast amount of such information has been created by memory institutions in the last decades using different standard or ad hoc schemas, and a main challenge is to make this legacy data accessible as reu...